-
-
-
-
-
- APPENDIX III - INSTRUCTION SPEED AND FLAGS
-
-
- This appendix contains a list of all the 8086 instructions along
- with their relative speeds and the flags they affect. These speed
- numbers are from INTEL and are in clock ticks.{1} If you have a
- 25 MHz clock, then it is doing 25 million ticks per second. If
- you have a 4.77 MHz clock, then it is doing 4.77 million ticks
- per second. Instead of calling them ticks, I'll be calling them
- clocks.
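
To put clock counts in perspective, here is a minimal sketch in Python (used only for illustration; the function name is hypothetical and not part of the tutor) that converts a number of clocks into microseconds for a given clock rate. It assumes the clock count itself is the same on both machines, which is only true for the same processor.

```python
def clocks_to_microseconds(clocks, clock_mhz):
    """Convert a clock count to microseconds for a clock rate given in MHz."""
    # 1 MHz is one million clocks per second, so clocks / MHz is microseconds.
    return clocks / clock_mhz


if __name__ == "__main__":
    for mhz in (4.77, 25.0):
        micros = clocks_to_microseconds(100, mhz)
        print(f"100 clocks at {mhz} MHz take {micros:.2f} microseconds")
```

The same 100 clocks take about five times longer on the 4.77 MHz clock than on the 25 MHz clock.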
-
- In order to understand these numbers, you need a little extra
- information. In order to do an instruction, ANY microprocessor
- must:
-
- (1) calculate any memory addresses.
- (2) fetch the two operands (or the single operand if it is
- an instruction like NOT).
- (3) process the instruction and
- (4) put the result in the destination.
-
- While (3) will require the same amount of time whether the
- operands are in memory or in registers, (1), (2) and (4) are all
- different depending on where the operands are. Some machines
- allow both operands to be in memory, but the 8086 does not.
- Therefore, (1) is either NO memory addresses (if everything is in
- registers) or ONE memory address.
-
- Calculating a memory address involves (a) calculating the offset
- from the beginning of the segment and (b) adding it to the
- segment starting address. Part (a) is different depending on the
- addressing mode. In terms of the speed of calculating (a) and
- (b), the order from fastest to slowest is:
-
- (i) one pointer
- (ii) named variable (+ constant)
- (iii) two pointers
- (iv) one pointer + constant
- (v) two pointers + constant
-
- The reason that (ii) contains both "variable" and "variable +
- constant" is that the addition (variable + constant) is done by
- the assembler, not the 8086. By the time the 8086 sees it, it is
- a single number. The constants in both (iv) and (v) need to be
- added by the 8086, and this takes extra time. According to INTEL,
- here is the time required for calculating all possible memory
- addresses. Note that some pointers are marginally faster than
- other pointers (this is trivial - don't worry about it).
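
As a rough illustration of steps (a) and (b), the following Python sketch computes an offset from two pointers plus a constant and then adds it to the segment starting address. It relies on a background fact not stated in this appendix: the 8086 forms the segment starting address by multiplying the segment register by 16. The function names are hypothetical.

```python
def offset_two_pointers_plus_constant(bx, di, constant):
    """Step (a): the offset for an address like [bx+di+constant]."""
    # Offsets are 16-bit values, so the sum wraps around at 64K.
    return (bx + di + constant) & 0xFFFF


def physical_address(segment, offset):
    """Step (b): add the offset to the segment starting address."""
    # Assumption: segment starting address = segment register * 16,
    # and the result is a 20-bit address on the 8086.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    off = offset_two_pointers_plus_constant(0x0100, 0x0020, 9)
    print(hex(off), hex(physical_address(0x1234, off)))
```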
-
- ____________________
-
- 1 All the speed numbers are from "Programmer's Pocket
- Reference Guide", (c)1980, 1982 Intel Corporation.
-
- ______________________
-
- The PC Assembler Tutor - Copyright (C) 1990 Chuck Nelson
-
-
-      ADDRESSING MODE                                CLOCKS (EA)
-      (i)   [bx], [si], [di], [bp]                        5
-      (ii)  variable (+ constant)                         6
-      (iii) [bp+di] or [bx+si]                            7
-            [bp+si] or [bx+di]                            8
-      (iv)  ([bx] or [si] or [di] or [bp]) + constant     9
-      (v)   ([bp+di] or [bx+si]) + constant              11
-            ([bp+si] or [bx+di]) + constant              12
-
- The most complicated memory address takes 2.4 times longer to
- calculate than the simplest address. These calculation times will
- be noted by EA (calculate Effective Address). Remember, if both
- operands are in registers, this calculation does not have to be
- done.
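
The EA list can be written as a small lookup. This is only a sketch in Python with hypothetical names. It uses the observation, read off the list above, that adding a constant to a pointer form always costs 4 extra clocks, while "variable + constant" costs nothing extra because the assembler does that addition.

```python
# EA clocks for the forms without a constant, taken from the list above.
EA_CLOCKS = {
    "[bx]": 5, "[si]": 5, "[di]": 5, "[bp]": 5,
    "variable": 6,
    "[bp+di]": 7, "[bx+si]": 7,
    "[bp+si]": 8, "[bx+di]": 8,
}


def ea_clocks(base, has_constant=False):
    """Return the EA clocks for an addressing mode."""
    clocks = EA_CLOCKS[base]
    # A constant added to pointers is added by the 8086 (4 more clocks).
    # A constant added to a named variable is folded in by the assembler.
    if has_constant and base != "variable":
        clocks += 4
    return clocks


if __name__ == "__main__":
    slowest = ea_clocks("[bx+di]", has_constant=True)
    fastest = ea_clocks("[bx]")
    print(slowest, fastest, slowest / fastest)
```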
-
- In order for you to see how all of this works, we'll use ADD as
- an example. Don't start using the table till you understand this
- example.
-
- On the 8086, we normally have the following possibilities for
- destination and source:
-
- register, register
- register, memory
- memory, register
- register, constant
- memory, constant
-
- In Appendix II, we simply combined them as:
-
- reg/mem, reg/mem
- reg/mem, constant
-
- We can't do this here because they all have different times. For
- ADD they are:
-
-      ADD                        CLOCKS
-      register, register           3
-      register, memory             9 + EA
-      memory, register            16 + EA
-      register, constant           4
-      memory, constant            17 + EA
-
- Notice how much faster using a register is. The EA stands for
- "calculate Effective Address" and is the number from the list
- above. If you have:
-
- add ax, bx
-
- that is "register, register", and it will take 3 clocks to
- execute. If you have:
-
- add [bx+di+9], 17
-
- that is "memory, constant" and will take 17 + EA. What is EA
- here? According to the above list, [BX+DI+CONSTANT] takes 12
- clocks, so 17 + EA = 17 + 12 = 29, and this will take 29 clocks.
- That's right. The one instruction is almost 10 times slower than
- the other. If you can move things into some registers, do a
- number of calculations, and then move them back to memory, you
- can save a lot of time. Let's do a few examples to make sure that
- you see all of them:
-
-
-      INSTRUCTION             TYPE                  TIME
-
-      add variable1, bl       memory, register      16 + EA = 22
-      add bl, variable1       register, memory       9 + EA = 15
-      add [si], di            memory, register      16 + EA = 21
-      add di, [si]            register, memory       9 + EA = 14
-
- Examples 1 and 2 are the same except that source and destination
- have been switched. The same applies to examples 3 and 4. Notice
- that when the 8086 has to fetch the variable and then put the
- result back in memory, it is significantly slower than when it
- just gets the variable from memory and puts the result in a
- register.
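
To see when it pays to work in a register, here is a hedged Python sketch that compares adding a constant to [bx+di+9] several times directly in memory with loading the word into AX, adding there, and storing it back. The MOV timings (register, memory = 8 + EA and memory, register = 9 + EA) come from the instruction list later in this appendix. The function names are hypothetical, and the odd-address penalty is ignored.

```python
EA = 12  # [bx+di+9]: two pointers + constant, from the EA list


def clocks_in_memory(n_adds):
    """n_adds times 'add [bx+di+9], constant' (memory, constant = 17 + EA)."""
    return n_adds * (17 + EA)


def clocks_via_register(n_adds):
    """Load into AX, add n_adds times in the register, store the result."""
    load = 8 + EA        # mov ax, [bx+di+9]   (MOV register, memory)
    work = n_adds * 4    # add ax, constant    (ADD register, constant)
    store = 9 + EA       # mov [bx+di+9], ax   (MOV memory, register)
    return load + work + store


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, clocks_in_memory(n), clocks_via_register(n))
```

With these numbers a single addition is cheaper in memory, but from the second addition on the register version is faster.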
-
-
-      INSTRUCTION             TYPE                  TIME
-
-      add ax, cx              register, register     3
-      add di, 1876            register, constant     4
-      add variable1, 199      memory, constant      17 + EA = 23
-
- To show you how the different types of EA affect the time, let's
- do all 3 types of "destination, source" that involve memory.
- First, "memory, register":
-
-
-      INSTRUCTION             TIME
-
-      add [bx], ax            16 + EA = 21
-      add variable1, ax       16 + EA = 22
-      add [bp+di], ax         16 + EA = 23
-      add [bp+si], ax         16 + EA = 24
-      add [bx+9], ax          16 + EA = 25
-      add [bp+di+294], ax     16 + EA = 27
-      add [bx+di+294], ax     16 + EA = 28
-
- Now let's do the same things but go "register, memory":
-
-      INSTRUCTION             TIME
-
-      add ax, [bx]             9 + EA = 14
-      add ax, variable1        9 + EA = 15
-      add ax, [bp+di]          9 + EA = 16
-      add ax, [bp+si]          9 + EA = 17
-      add ax, [bx+9]           9 + EA = 18
-      add ax, [bp+di+294]      9 + EA = 20
-      add ax, [bx+di+294]      9 + EA = 21
-
-
- And finally we have "memory, constant":
-
-      INSTRUCTION             TIME
-
-      add [bx], 177           17 + EA = 22
-      add variable1, 177      17 + EA = 23
-      add [bp+di], 177        17 + EA = 24
-      add [bp+si], 177        17 + EA = 25
-      add [bx+9], 177         17 + EA = 26
-      add [bp+di+294], 177    17 + EA = 28
-      add [bx+di+294], 177    17 + EA = 29
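
The three tables above all follow the same rule: take the base clocks for the type of ADD and add EA only when one operand is in memory. A minimal Python sketch of that rule (the names are hypothetical, and EA is passed in as a number from the EA list):

```python
# Base clocks for ADD by (destination, source) and whether EA applies.
ADD_BASE = {
    ("register", "register"): (3, False),
    ("register", "memory"): (9, True),
    ("memory", "register"): (16, True),
    ("register", "constant"): (4, False),
    ("memory", "constant"): (17, True),
}


def add_clocks(destination, source, ea=0):
    """Clocks for one ADD; ea is ignored when no operand is in memory."""
    base, uses_memory = ADD_BASE[(destination, source)]
    return base + (ea if uses_memory else 0)


if __name__ == "__main__":
    print(add_clocks("register", "register"))     # add ax, bx
    print(add_clocks("memory", "constant", 12))   # add [bx+di+294], 177
```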
-
-
- Is this everything you need to know before looking at the list?
- Not quite. Most of the 8086 family has a 16 bit data bus. That
- means that there are 16 wires connecting the processor to memory,
- and the processor reads 1 word (2 bytes) at a time. These memory
- reads ALWAYS start at an even location (1472d, 88026d, 198752d,
- etc.). If you are reading one byte, it makes no difference whether
- it is at an even or odd location. If you are reading a word at an
- even location, then everything is normal. If you are reading a
- word at an ODD location, however, the processor must:
-
- 1) start reading at the first even location that contains
- the variable.
- 2) read the next even location (which contains the last part
- of the variable).
- 3) join the parts together.
-
- As an example, let's take a word at address 21957 (i.e. bytes
- 21957-21958). The processor will:
-
-      1) read the word at 21956 and keep its high byte (21957)
-      2) read the word at 21958 and keep its low byte (21958)
-      3) join them together. It now has 21957-21958.
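
The three steps can be imitated in a short Python sketch. Memory is modeled as a dictionary of byte values, and the function names are hypothetical. It only shows the bookkeeping, not how the hardware does it.

```python
def read_aligned_word(memory, address):
    """Read one word starting at an even address (low byte first)."""
    if address % 2 != 0:
        raise ValueError("aligned reads must start at an even address")
    return memory[address] | (memory[address + 1] << 8)


def read_word(memory, address):
    """Return (word, number of aligned reads needed)."""
    if address % 2 == 0:
        return read_aligned_word(memory, address), 1
    first = read_aligned_word(memory, address - 1)   # step 1
    second = read_aligned_word(memory, address + 1)  # step 2
    low_byte = first >> 8        # high byte of the first word
    high_byte = second & 0xFF    # low byte of the second word
    return (high_byte << 8) | low_byte, 2            # step 3


if __name__ == "__main__":
    memory = {21956: 0x11, 21957: 0x34, 21958: 0x12, 21959: 0x22}
    word, reads = read_word(memory, 21957)
    print(hex(word), reads)
```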
-
- The processor can do this, but it takes extra time (4 extra clock
- ticks), so our speed listing will also contain the following
- notice:
-
- WORDS WHICH ARE AT ODD ADDRESSES NEED 4 EXTRA CLOCKS
-
- Thus for:
-
- add ax, variable1 (9 + EA = 15)
-
- if the address is an even location, the instruction will require
- 15 clocks. If it is an odd location, the instruction will require
- 19 clocks (4 extra ticks). It is worth your while to keep words
- at even locations if at all possible.
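
The odd-address rule can be added to a timing calculation like this. It is a sketch in Python with hypothetical names, following the rule exactly as stated above (4 extra clocks for a word at an odd address, nothing extra for bytes).

```python
def total_clocks(base, ea, is_word, address):
    """Base clocks + EA, plus 4 if a word operand sits at an odd address."""
    penalty = 4 if is_word and address % 2 == 1 else 0
    return base + ea + penalty


if __name__ == "__main__":
    # add ax, variable1  (register, memory = 9 + EA, EA = 6)
    print(total_clocks(9, 6, True, 21956))   # word at an even address
    print(total_clocks(9, 6, True, 21957))   # word at an odd address
```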
-
-
- INSTRUCTION SPEEDS AND FLAGS
-
-
- REGISTER:
- One of the arithmetic registers - either word (AX, BX, CX,
- DX, SI, DI, BP, SP for word operations) or byte (AH, AL, BH,
- BL, CH, CL, DH or DL for byte operations).
-
- AX or AL:
- AX and AL are considered special on the 8086. Sometimes
- there is a special AX/AL form of the instruction (such as
- for ADD). This form will be shorter. It will only be noted
- if it is FASTER (such as for MOV) or if it is the only form
- allowed. It will either say (AX/AL) or (AX only) depending
- on whether both words and bytes are allowed or only words
- are allowed. Though both multiplication and division require
- the use of the (AX/AL) register, the AX/AL is understood,
- and is not mentioned in the instruction.
-
- SEGREG:
- One of the 4 segment registers - CS, DS, ES or SS.
-
- MEMORY:
- Either a byte or word in memory. It may be addressed with
- any possible addressing mode, and the extra time needed to
- calculate the address in memory (EA - calculate Effective
- Address) is the following:
-
-
-      ADDRESSING MODE                                CLOCKS (EA)
-
-      (i)   [bx], [si], [di], [bp]                        5
-      (ii)  variable (+ constant)                         6
-      (iii) [bp+di] or [bx+si]                            7
-            [bp+si] or [bx+di]                            8
-      (iv)  ([bx] or [si] or [di] or [bp]) + constant     9
-      (v)   ([bp+di] or [bx+si]) + constant              11
-            ([bp+si] or [bx+di]) + constant              12
-
- EXTRA TIME:
- 1) WORDS WHICH ARE AT ODD ADDRESSES NEED 4 EXTRA CLOCKS.
- 2) A segment override adds 2 clocks to the instruction.
-
-
-
- FLAGS
-
- Following the instruction mnemonic is information in square
- brackets that indicates how the instruction affects the flags
- register. There are three possibilities:
-
- 1) The instruction may alter the value of a flag in a
- particular way depending on the result. For AND, the sign,
- zero and parity flags [SZP] will be set according to the
- result.
-
- 2) The instruction may set a flag to a specific number
- (either 1 or 0). For AND, the overflow flag and the carry
- flag are cleared [(OC=0)].
-
- 3) The instruction may unreliably alter a flag. That means
- that the instruction might change the flag, but that it
- gives you no information. This kind of flag cannot be
- trusted after this operation. For AND, the auxiliary flag is
- unreliable [?A?].
-
- The information will always be displayed in this order. The flags
- which are reliably set according to the result will be listed
- first [SZP]. Next, any flags which are forced to a specific value
- will be put inside parentheses, followed by an equal sign and
- the value of the flag (all inside the parentheses)
- [SZP,(OC=0)]. Finally, any unreliable flags will be put between
- question marks [SZP,(OC=0),?A?]. The parts will be separated by
- commas. If no flags are changed by the instruction, the brackets
- will have [none] written between them.
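
If it helps to see the notation taken apart, here is a small Python sketch that splits a flags entry into its three parts. The function name and the dictionary keys are hypothetical, and entries that contain footnote markers such as {4} are not handled.

```python
def parse_flags(notation):
    """Split an entry like '[SZP,(OC=0),?A?]' into its three parts."""
    inner = notation.strip()[1:-1]          # drop the square brackets
    parsed = {"by_result": "", "forced": {}, "unreliable": ""}
    if inner == "none":
        return parsed
    for part in inner.split(","):
        part = part.strip()
        if part.startswith("(") and part.endswith(")"):
            # flags forced to a specific value, e.g. (OC=0)
            letters, value = part[1:-1].split("=")
            for letter in letters:
                parsed["forced"][letter] = int(value)
        elif part.startswith("?") and part.endswith("?"):
            # flags that cannot be trusted afterwards, e.g. ?A?
            parsed["unreliable"] += part[1:-1]
        else:
            # flags set according to the result, e.g. SZP
            parsed["by_result"] += part
    return parsed


if __name__ == "__main__":
    print(parse_flags("[SZP,(OC=0),?A?]"))
```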
-
- Each flag will be indicated by a single letter. The letters are:
-
-      O    overflow flag       0 = no overflow, 1 = overflow
-
-      D    direction flag      direction of movement
-                               for string instructions
-                               0 = upwards, 1 = downwards
-
-      I    interrupt enable    0 = no ints, 1 = ints o.k.
-
-      T    trap flag           trap next instruction?
-                               0 = no trap, 1 = trap
-
-      S    sign flag           0 = positive, 1 = negative
-
-      Z    zero flag           0 = non-zero, 1 = zero
-
-      A    auxiliary flag      carry out of bottom half
-                               register? 0 = no, 1 = yes
-
-      P    parity flag         0 = odd, 1 = even
-
-      C    carry flag          0 = no carry, 1 = carry
-
-
-
- ***************** THE INSTRUCTIONS *********************
-
- INSTRUCTION                        TIMING
-
- AAA              [AC,?OSZP?]       4 clocks
-
- AAD              [SZP,?OAC?]       60 clocks
-
- AAM              [SZP,?OAC?]       83 clocks
-
- AAS              [AC,?OSZP?]       4 clocks
-
-
- ADC              [OSZAPC]          see ADD
-
- ADD              [OSZAPC]          register, register      3
-                                    register, memory        9 + EA
-                                    memory, register        16 + EA
-                                    register, constant      4
-                                    memory, constant        17 + EA
-
- AND              [SZP,(OC=0),?A?]  see ADD
-
- CALL             [none]            near call               19
-                                    far call                28
-                                    near call (reg-ind)     16 {2}
-                                    near call (mem-ind)     21 + EA
-                                    far call (mem-ind)      37 + EA
-
- CBW              [none]            2 clocks
-
- CLC              [(C=0)]           2 clocks
-
- CLD              [(D=0)]           2 clocks
-
- CLI              [(I=0)]           2 clocks
-
- CMC              [C]               2 clocks
-
- CMP              [OSZAPC]          register, register      3
-                                    register, memory        9 + EA
-                                    memory, register        9 + EA
-                                    register, constant      4
-                                    memory, constant        10 + EA
-
- CMPS             [OSZAPC]          22 clocks
-
- CWD              [none]            5 clocks
-
- DAA              [SZAPC,?O?]       4 clocks
-
- DAS              [SZAPC,?O?]       4 clocks
-
- DEC              [OSZAP]           word register           2
-                                    byte register           3
-                                    word/byte memory        15 + EA
-
- DIV              [?OSZAPC?]        byte register           80 - 90 {3}
-                                    word register           144 - 162
-                                    byte memory             (86 - 96) + EA
-                                    word memory             (150 - 168) + EA
- ____________________
-
- 2 The last three are indirect calls. They get the
- address of the subroutine from a register (reg-ind) or from
- memory (mem-ind).
-
- 3 The smaller the numbers involved, the faster this operation
- can be accomplished. This applies to signed and unsigned
- multiplication and division, which all show a range of values
- rather than a specific value.
-
- ESC              [none]            memory                  8 + EA
-                                    coproc. register        2
-                                    This only makes sense if you know
-                                    about coprocessors.
-
- HLT              [none]            2 clocks
-
- IDIV             [?OSZAPC?]        byte register           101 - 112
-                                    word register           165 - 184
-                                    byte memory             (107 - 118) + EA
-                                    word memory             (171 - 190) + EA
-
- IMUL             [OC,?SZAP?]       byte register           80 - 98
-                                    word register           128 - 154
-                                    byte memory             (86 - 104) + EA
-                                    word memory             (134 - 160) + EA
-
- IN               [none]            (AX/AL), port#          10
-                                    (AX/AL), dx             8
-
- INC              [OSZAP]           word register           2
-                                    byte register           3
-                                    word/byte memory        15 + EA
-
- INT              [(IT=0) {4} ]     51 clocks {5}
-
- INTO             [ {6} ]           overflow                53
-                                    no overflow             4
-
- IRET             [ {7} ]           24 clocks
-
-
- J(condition)     [none]            This includes all conditional jumps
-                                    (JAE, JZ, JNO, JLE, JP, etc.) with the
-                                    exception of JCXZ.
-                                    jump                    16
-                                    no jump                 4
- ____________________
-
- 4 Although this sets the trap flag and interrupt flag to 0, it
- doesn't do it to YOUR flags, it does it to the flags that the
- interrupt will see. Your flags are safely stored on the stack and
- will return unaltered at the end of the interrupt.
-
- 5 There is one exception. INT 3, when coded as the single byte
- trap interrupt, is 52 clocks.
-
- 6 If there is overflow, it pushes your flags and sets (IT=0)
- for the interrupt. If there was no overflow, it does nothing. In
- either case your own flags will remain unaffected.
-
- 7 This puts the copy of your old flags back in the flags
- register. The flags will be the same as they were when you called
- the interrupt.
-
-
- JCXZ             [none]            jump                    18
-                                    no jump                 6
-
- JMP              [none]            same segment            15
-                                    different segment       15
-                                    near (reg-ind)          11 {8}
-                                    near (mem-ind)          18 + EA
-                                    far (mem-ind)           24 + EA
-
- LAHF             [none]            4 clocks
-
- LDS              [none]            16 + EA
-
- LES              [none]            16 + EA
-
- LEA              [none]            2 + EA
-
- LOCK             [none]            2 clocks
-
- LODS             [none]            12 clocks
-
- LOOP             [none]            jump                    17
-                                    no jump                 5
-
- LOOPE/LOOPZ      [none]            jump                    18
-                                    no jump                 6
-
- LOOPNE/LOOPNZ    [none]            jump                    19
-                                    no jump                 5
-
- MOV              [none]            register, register      2
-                                    register, memory        8 + EA
-                                    memory, register        9 + EA
-                                    register, constant      4
-                                    memory, constant        10 + EA
-                                    (AX/AL) <-> memory      10 {9}
-                                    segreg <-> register     2 {10}
-                                    segreg, memory          8 + EA
-                                    memory, segreg          9 + EA
- ____________________
-
- 8 These last three are indirect jumps. The information about
- where to jump to is coming from a register (reg-ind) or from
- memory (mem-ind).
-
- 9 This is a special instruction which moves a directly
- addressed variable to or from AX (or AL for bytes). Pointers are
- not allowed, only the forms:
-
-      mov ax, variable1
-      mov variable1, ax
-
- This takes 10 clocks instead of 14 or 15 for the other form.
- Whether this form gets used is up to the assembler, not you.
- Fortunately, MASM, TurboAssembler and A86 all use this form when
- appropriate.
-
- MOVS             [none]            18 clocks
-
- MUL              [OC,?SZAP?]       byte register           70 - 77
-                                    word register           118 - 133
-                                    byte memory             (76 - 83) + EA
-                                    word memory             (124 - 139) + EA
-
- NEG              [OSZAPC] {11}     register                3
-                                    memory                  16 + EA
-
- NOP              [none]            3 clocks
-
- NOT              [none]            register                3
-                                    memory                  16 + EA
-
- OR               [SZP,(OC=0),?A?]  see ADD
-
- OUT              [none]            port#, (AX/AL)          10
-                                    dx, (AX/AL)             8
-
- POP              [none]            register                8
-                                    segreg                  8
-                                    memory                  17 + EA
-
- POPF             [ {12} ]          8 clocks
-
- PUSH             [none]            register                11
-                                    segreg                  10
-                                    memory                  16 + EA
-
- PUSHF            [none]            10 clocks
-
- RCL              [OC]              register by 1 bit       2
-                                    memory by 1 bit         15 + EA
-                                    register by # in CL     8 + (4 * #) {13}
-                                    memory by # in CL       20 + EA + (4 * #)
-
- RCR              [OC]              see RCL
-
- ____________________
-
- 10 MOV, PUSH and POP are the only instructions that take a
- segment register as an operand. MOV and POP can alter a segment
- register (as can LDS, LES, and far CALLs, JMPs and RETs).
-
- 11 (NEG number) sets the flags the same as (SUB 0, number).
-
- 12 POPF resets the flags register by POPping a word off the
- stack and using the values stored in that word.
-
- 13 Thus, if you rotate right by 3 bits add (4 * 3), if you
- rotate by 7 bits add (4 * 7), if you rotate by 2 bits add
- (4 * 2). As you can see, this can cost a lot of time if you are
- rotating more than 3 or 4 bits.
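
To make the cost of the CL form concrete, here is a small Python sketch (hypothetical names) that compares it with simply repeating the 1-bit form, using the register timings from the RCL entry. It ignores the time needed to load CL and the extra instruction bytes that the repeated form takes.

```python
def rotate_by_cl(bits):
    """Register rotated by the number in CL: 8 + (4 * #) clocks."""
    return 8 + 4 * bits


def rotate_one_bit_repeated(bits):
    """The 1-bit register form (2 clocks) written out 'bits' times."""
    return 2 * bits


if __name__ == "__main__":
    for bits in (2, 3, 7):
        print(bits, rotate_by_cl(bits), rotate_one_bit_repeated(bits))
```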
-
-
- REP              [none]            2 clocks
-
- RET              [none]            near ret                8 {14}
-                                    near ret (#)            12
-                                    far ret                 18
-                                    far ret (#)             17
-
- ROL              [OC]              see RCL
-
- ROR              [OC]              see RCL
-
- SAHF             [ {15} ]          4 clocks
-
- SAL/SHL          [OSZPC,?A?]       see RCL
-
- SAR              [OSZPC,?A?]       see RCL
-
- SBB              [OSZAPC]          see ADD
-
- SCAS             [OSZAPC]          15 clocks
-
- SEGMENT OVERRIDE [none]            2 clocks
-
- SHR              [OSZPC,?A?]       see RCL
-
- STC              [(C=1)]           2 clocks
-
- STD              [(D=1)]           2 clocks
-
- STI              [(I=1)]           2 clocks
-
- STOS             [none]            11 clocks
-
- SUB              [OSZAPC]          see ADD
-
- TEST             [SZP,(OC=0),?A?]  register, register      3
-                                    register, memory        9 + EA
-                                    memory, register        9 + EA
-                                    (AX/AL), constant       4
-                                    register, constant      5
-                                    memory, constant        11 + EA
-
- WAIT             [none]            3 clocks minimum, then check every 5
-                                    clocks
-
- XCHG             [none]            register, register      4
-                                    (AX only), register     3
-                                    register, memory        17 + EA
- ____________________
-
- 14 The # here indicates that you pop things off the stack as
- you would in a Pascal program:
-
-      ret (18)
-      ret (6)
-
- 15 Alters the values of the SZAPC flags according to the
- values in the AH register.
-
- XLAT             [none]            11 clocks
-
- XOR              [SZP,(OC=0),?A?]  see ADD
-
-